Geospatial analytics hold tremendous potential to address complex problems facing society. The purpose of this study is to understand the spatial patterns of non-functional water points in Nigeria.
1.0 Introduction
Loading Packages
We will use the following packages:
sf: import geospatial datasets
tidyverse: manipulate aspatial data
spdep: compute spatial weights and autocorrelation
tmap: plot maps
funModeling: quick exploratory data analysis
Loading and Preparing Waterpoint Dataset
The water point data is collected by the Water Point Data Exchange (WPdx) whose goal is to improve water access to rural communities by providing data to enable data-driven decision making. The dataset can be found here and the data dictionary here. The data is in csv format with latitude and longitude information.
wp <-read_csv("data/WPdx_plus_Nigeria.csv")
Rows: 95008 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): #clean_adm1, #clean_adm2, #status_clean
dbl (2): #lat_deg, #lon_deg
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Due to the size of the dataset, it has already been pre-processed to keep only entries in Nigeria and some unused variables have been removed. The following code was used to pre-process the raw data file “Water_Point_Data_Exchange_-_Plus__WPdx__.csv” from the website but is not run on this page (raw data file is also not found on GitHub).
We can see that the variable names have some special characters which is not ideal. There are also a lot of NA values in the variable of interest (#status_clean). The following code chunk cleans the variable names and replaces the NA values with “Unknown”.
We can see that the categories are more detailed than we need. To study the proportion of functional and non-functional waterpoints, we can combine the categories into 3 categories: “Functional”, “Non-functional” and “unknown”
wp <- wp %>%mutate(status =case_when( status_clean %in%c("Abandoned/Decommissioned", "Abandoned","Non-Functional","Non functional due to dry season","Non-Functional due to dry season") ~"Non-functional", status_clean =="Unknown"~"Unknown", status_clean %in%c("Functional", "Functional but not in use","Functional but needs repair") ~"Functional" ))
Let’s visualise the proportions of functionality of the waterpoints. On the whole, only 55% of waterpoints are functional.
freq(wp, input="status")
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the funModeling package.
Please report the issue at <https://github.com/pablo14/funModeling/issues>.
status frequency percentage cumulative_perc
1 Functional 52148 54.89 54.89
2 Non-functional 32204 33.90 88.79
3 Unknown 10656 11.22 100.00
Now, I convert the aspatial data into geospatial point data from the latitude and longitude data using the st_as_sf() function. The original GCS of the data is WGS1984 (EPSG:4326) from the data dictionary. We need to project it to the EPSG:26391 CRS.
Now I have 2 geospatial datasets: a point dataset with waterpoint locations and a polygon data with administrative boundaries. I still need to count the number of waterpoints by status for each administrative area. Although the clean_adm2 variable in the waterpoint data should correspond to the administrative area the waterpoint belongs to, it is more accurate to use the GPS coordinates.
I will use the st_join() function to do a spatial join to relate the administrative area names to each waterpoint by its location. The join=st_intersects() argument tells R the type of spatial join to use.
Next, we need check if there are any missing values.
sum(is.na(wp_named$shapeID))
[1] 29
We can plot the points to visually check why these points are missing the administrative area name. Most likely it is because they fall outside any administrative area. If so, we can safely ignore these points. Setting tmap_mode("view") creates an interactive plot so we can zoom in to check the points. In addition, tm_dots() is used instead of tm_shape() this time so that the size of each point scales when zooming in on the interactive map.
We can see that these 29 points fall outside the boundary of Nigeria so we can exclude them.
Now let’s extract the number of waterpoints by status in each administrative boundary and join it to the administrative boundary layer. First, I need to remove the geometry data using the st_drop_geometry() function to manipulate it like a regular dataframe using tidyr and dplyr functions.
The next step is to group by administrative area name and status to generate the count. Lastly, we pivot from long to wide format for joining with the administrative boundary dataset. The values_fill=0 argument replaces any na values in the values_from variable with 0.
`summarise()` has grouped output by 'shapeID'. You can override using the
`.groups` argument.
head(prop, n=5) %>%kable() %>%kable_styling()
shapeID
Functional
Non-functional
Unknown
NGA-ADM2-72505758B10049836
31
39
0
NGA-ADM2-72505758B10063467
50
38
30
NGA-ADM2-72505758B10065661
104
51
23
NGA-ADM2-72505758B10302610
64
53
84
NGA-ADM2-72505758B11317593
51
49
0
Now, we use left_join() to relate the counts to the administrative boundary geospatial data. We also need to replace any na counts with 0 and add a new variable for total number of waterpoints.
The distribution of waterpoints across Nigeria does not appear to be evenly distributed. There are some small administrative areas with a high number of waterpoints (total and functional) in the north of Nigeria. There is one administrative area in the central west with a high number of waterpoints, but also a high number of non-functional waterpoints. Based on the distribution of waterpoints of unknown status, we can infer that the north of Nigeria is likely more developed because there are fewer waterpoints of unknown status; likewise, central Nigeria may not be as developed because there is a high number of water points of unknown or non-functional status.
The code chunk below plots total waterpoints using quantile breaks. We can also add a histogram to view the distribution of total waterpoints.
tm_shape(adm_wp)+tm_polygons("Total",style="quantile",palette="RdBu",legend.hist=TRUE) +tm_layout(main.title="Total Waterpoints in Nigeria",main.title.size=1.1,title.snap.to.legend=FALSE,legend.outside=TRUE,legend.hist.width =1.1)
From this map, we can see that the north-east and south of Nigeria tend to have fewer waterpoints. More than 60% of administrative areas have les than 200 waterpoints. As we do not know the population or water demand of each administrative area, it is difficult to say which areas are water stressed or need additional water infrastructure.
The spatial distribution of waterpoint supply shows some relation to the climate classification and population distribution. The south of Nigeria has higher rainfall which could mean less reliance on man-made waterpoints and thus fewer waterpoints.Area with higher population also tend to have more waterpoints.
Koppen-Geiger climate classification map for Nigeria. Source: Beck, H. E., et al. (2018)
Population density in Nigeria. Source: Wikimedia Commons
tm_shape(adm_wp)+tm_polygons("Functional",style="quantile",palette="RdBu",legend.hist=TRUE) +tm_layout(main.title="Total Functional Waterpoints in Nigeria",main.title.size=1.1,title.snap.to.legend=FALSE,legend.outside=TRUE,legend.hist.width =1.1)
tm_shape(adm_wp)+tm_polygons("Non-functional",style="quantile",palette="-RdBu",legend.hist=TRUE) +tm_layout(main.title="Total Non-Functional Waterpoints in Nigeria",main.title.size=1.1,title.snap.to.legend=FALSE,legend.outside=TRUE,legend.hist.width =1.1)
We can also plot it as proportions of total water points. First, we need to generate new variables for proportions.
The following plot shows the number of non-functional waterpoints out of total waterpoints by administrative area. We can see that there are some administrative areas where there some few waterpoints and most of them are non-functional.
ggplot(adm_wp) +geom_bar(aes(x=reorder(shapeID, `pNon-functional`, decreasing=TRUE), y=Total,fill="Total"),stat="identity") +geom_bar(aes(x=reorder(shapeID, `pNon-functional`, decreasing=TRUE), y=`Non-functional`,fill="Non-functional"),stat="identity",alpha=0.8) +scale_fill_manual(name="",values=c("red", "gray30")) +labs(title="Number of Water by Administrative Area",subtitle="(sorted by proportion of non-functional)",y="Number of waterpoints",x="Administrative Areas")+theme(axis.text.x=element_blank(),axis.ticks.x=element_blank(),axis.line.y=element_line(colour="grey50"))
We can see that many administrative areas in the north have more waterpoints and a higher proportion of functional water points. Many states in the south part of Nigeria have few waterpoints a high proportion of non-functional waterpoints. These areas are likely more water-stressed and efforts to restore waterpoints can be concentrated in those areas.
In addition, there are some administrative areas in north-east tip of Nigeria without any waterpoints at all